
Worldwide Web Crawls

Wide crawls of the Internet conducted by Internet Archive. Please visit the Wayback Machine to explore archived web sites.

Since September 10th, 2010, the Internet Archive has been running Worldwide Web Crawls of the global web, capturing web elements, pages, sites and parts of sites.

Each Worldwide Web Crawl was initiated from one or more lists of URLs that are known as "Seed Lists". Descriptions of the Seed Lists associated with each crawl may be provided as part of the metadata for each Crawl.

Worldwide Web Crawls are run using the Heritrix software.

In addition, various rules are applied to the logic of each crawl. These rules define things like the depth the crawler will try to reach on each host (website) it finds. In general terms, the crawling software identifies all the URLs on each page it captures, follows those links, attempts to capture those pages, identifies new URLs, follows those links, and so on, until the crawl is stopped or preset conditions such as site depth limits are reached. For the most part, a given host is captured only once per Worldwide Web Crawl; however, it might be captured more frequently (e.g., once per hour for various news sites) via other crawls.
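The link-following loop described above can be illustrated with a minimal sketch. This is not Heritrix code or its configuration: the function name `crawl`, the parameters `max_host_depth` and `max_captures`, and the in-memory `TOY_WEB` link graph are all hypothetical, chosen so the example runs without network access. It assumes that depth is counted per host and resets when a link leads to a different host, which is only one possible reading of a "site depth limit".

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/2"],
    "http://a.example/2": ["http://a.example/3"],
    "http://a.example/3": [],
    "http://b.example/": ["http://a.example/"],
}


def crawl(seeds, fetch_links, max_host_depth=2, max_captures=1000):
    """Breadth-first crawl from a seed list with a per-host depth limit."""
    queue = deque((url, 0) for url in seeds)  # (url, depth within its host)
    seen = set(seeds)
    captured = []
    # Stop when the frontier is empty or the capture budget is used up.
    while queue and len(captured) < max_captures:
        url, depth = queue.popleft()
        captured.append(url)  # stands in for capturing the page
        host = urlsplit(url).hostname
        for link in fetch_links(url):  # URLs identified on the page
            if link in seen:
                continue
            # Assumption: depth restarts at 0 on a newly reached host.
            same_host = urlsplit(link).hostname == host
            link_depth = depth + 1 if same_host else 0
            if link_depth > max_host_depth:
                continue  # site depth limit reached, do not follow
            seen.add(link)
            queue.append((link, link_depth))
    return captured


captured = crawl(["http://a.example/"], lambda u: TOY_WEB.get(u, []))
print(captured)
```

With `max_host_depth=2`, the page `http://a.example/3` is never captured because it sits three links deep on its host, while `http://b.example/` is captured as a newly discovered host.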

Wide Crawl started April 2013
Collection: 25,005 items, 427.1M views
Web wide crawl with initial seedlist and crawler configuration from April 2013.
Wide Crawl started June 2014
Collection: 45,313 items, 295.1M views
Web wide crawl with initial seedlist and crawler configuration from June 2014.
Wide Crawl started August 2013
Collection: 21,911 items, 266.3M views
Web wide crawl with initial seedlist and crawler configuration from August 2013.
Wide Crawl Number 12 - started March, 14th 2015
Collection: 49,621 items, 260.6M views
Web wide crawl with initial seedlist and crawler configuration from January 2015.
Wide Crawl started January 2012
Collection: 30,362 items, 258.6M views
Web wide crawl with initial seedlist and crawler configuration from January 2012 using HQ software.
Wide Crawl started April 2012
Collection: 39,252 items, 228.9M views
Web wide crawl with initial seedlist and crawler configuration from April 2012.
Wide Crawl Number 14 started March 2016
Collection: 71,730 items, 209.4M views
Web wide crawl.
Wide Crawl started October 2010
Collection: 15,839 items, 167.1M views
Web wide crawl with initial seedlist and crawler configuration from October 2010.
Wide Crawl Started January 2013
Collection: 15,138 items, 164.2M views
Wide crawls of the Internet conducted by Internet Archive. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Wide Crawl started September 2012
Collection: 22,402 items, 163.9M views
Web wide crawl with initial seedlist and crawler configuration from September 2012.
Wide Crawl Number 13
Collection: 46,049 items, 154M views
Web Wide Crawl Number 13
Wide Crawl started October 2011
Collection: 12,648 items, 147M views
Web wide crawl with initial seedlist and crawler configuration from March 2011 using HQ software.
Wide Crawl started February 2014
Collection: 9,789 items, 136.2M views
Web wide crawl with initial seedlist and crawler configuration from February 2014.
Wide Crawl started March 2011
Collection: 8,528 items, 132.5M views
Web wide crawl with initial seedlist and crawler configuration from March 2011. This uses the new HQ software for distributed crawling by Kenji Nagahashi.
What’s in the data set:
Crawl start date: 09 March, 2011
Crawl end date: 23 December, 2011
Number of captures: 2,713,676,341
Number of unique URLs: 2,273,840,159
Number of hosts: 29,032,069
The seed list for this crawl was a list of Alexa’s top 1 million web sites, retrieved close to the crawl start date. We used Heritrix (3.1.1-SNAPSHOT)...
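The counts quoted in the March 2011 crawl description can be turned into rough averages. The short calculation below is only an illustrative sketch: the variable names are made up for this example, and the derived ratios are simple divisions of the published totals, not figures reported by the Internet Archive.

```python
from datetime import date

# Totals as quoted in the March 2011 crawl description.
captures = 2_713_676_341
unique_urls = 2_273_840_159
hosts = 29_032_069
crawl_start = date(2011, 3, 9)
crawl_end = date(2011, 12, 23)

# Derived averages (illustrative only; real distributions are very uneven).
captures_per_url = captures / unique_urls
urls_per_host = unique_urls / hosts
crawl_days = (crawl_end - crawl_start).days
captures_per_day = captures / crawl_days

print(f"captures per unique URL: {captures_per_url:.2f}")
print(f"unique URLs per host:    {urls_per_host:.1f}")
print(f"crawl duration (days):   {crawl_days}")
print(f"captures per day:        {captures_per_day:,.0f}")
```

An average above one capture per unique URL is consistent with some URLs being fetched more than once during the crawl.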
Wide Crawl Number 15
Collection: 22,557 items, 23.4M views
Web wide crawl.
Wide Crawl started September 2010
Collection: 332 items, 4.6M views
Web wide crawl with initial seedlist and crawler configuration from September 2010.
Wide Crawl started January 2012
web
eye 4.2M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jan 17 08:02:53 PST 2012 to Tue Jan 17 01:16:20 PST 2012.
Topic: crawldata
Wide Crawl started February 2014
web
eye 2M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web
eye 2M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web
eye 2M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web
eye 1.9M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web
eye 1.9M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Wed Feb 19 08:18:23 PST 2014 to Wed Feb 19 04:21:37 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web
eye 1.9M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Tue Feb 18 22:58:46 PST 2014 to Tue Feb 18 19:25:19 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web
eye 1.9M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web
eye 1.8M
favorite 1
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Sat Feb 8 03:46:42 PST 2014 to Fri Feb 7 23:17:16 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web
eye 1.8M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl414.us.archive.org:wide from Sat Feb 8 04:46:28 PST 2014 to Sat Feb 8 00:01:23 PST 2014.
Topic: crawldata
Wide Crawl started January 2012
web
eye 1.6M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Jan 21 04:01:50 PST 2012 to Fri Jan 20 21:01:34 PST 2012.
Topic: crawldata
Wide Crawl started April 2013
web
eye 1.4M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Wed May 15 12:25:51 PDT 2013 to Wed May 15 06:56:55 PDT 2013.
Topic: crawldata
Wide Crawl started February 2014
web
eye 1.1M
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl338.us.archive.org:wide from Sat Feb 22 06:42:19 PST 2014 to Sat Feb 22 01:03:35 PST 2014.
Topic: crawldata
Wide Crawl started June 2014
web
eye 969,587
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Tue Jul 1 13:59:42 PDT 2014 to Tue Jul 1 08:25:02 PDT 2014.
Topic: crawldata
Wide Crawl started April 2013
web
eye 924,563
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl424.us.archive.org:wide from Wed Jul 24 23:32:46 PDT 2013 to Wed Jul 24 18:16:50 PDT 2013.
Topic: crawldata
Wide Crawl started February 2014
web
eye 913,742
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Sat Feb 22 09:02:42 PST 2014 to Sat Feb 22 06:09:28 PST 2014.
Topic: crawldata
Wide Crawl started October 2010
web
eye 787,321
favorite 0
comment 0
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Fri Sep 24 20:27:19 UTC 2010 to Sat Sep 25 04:26:09 UTC 2010.
Topic: crawldata
Wide Crawl started April 2013
web
eye 622,850
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Wed May 15 12:43:53 PDT 2013 to Wed May 15 07:09:54 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
web
eye 617,965
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Wed May 15 13:19:48 PDT 2013 to Wed May 15 07:48:12 PDT 2013.
Topic: crawldata
Wide Crawl started February 2014
web
eye 600,061
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Mon Feb 10 13:27:36 PST 2014 to Mon Feb 10 08:27:12 PST 2014.
Topic: crawldata
Host Screen Captures
Collection: 15,437 items, 591,069 views
Screen captures of hosts discovered during wide crawls. This data is currently not publicly accessible.
Wide Crawl started September 2012
web
eye 527,729
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl337.us.archive.org:wide from Wed Oct 17 08:14:47 PDT 2012 to Wed Oct 17 02:41:59 PDT 2012.
Topic: crawldata
Wide Crawl started April 2013
web
eye 506,898
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl418.us.archive.org:wide from Wed Jun 26 16:22:29 PDT 2013 to Wed Jun 26 11:29:54 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
web
eye 465,299
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sun May 12 11:51:10 PDT 2013 to Sun May 12 06:15:36 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
web
eye 447,686
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl419.us.archive.org:wide from Wed May 15 09:24:17 PDT 2013 to Wed May 15 04:10:53 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
web
eye 447,278
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl417.us.archive.org:wide from Wed May 15 09:49:58 PDT 2013 to Wed May 15 04:09:39 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
web
eye 436,366
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl418.us.archive.org:wide from Sun May 12 01:57:26 PDT 2013 to Sat May 11 20:51:14 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
web
eye 436,236
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl418.us.archive.org:wide from Sat May 11 23:29:50 PDT 2013 to Sat May 11 19:10:10 PDT 2013.
Topic: crawldata
Wide Crawl started January 2012
web
eye 434,452
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Wed Jan 11 15:00:58 PST 2012 to Wed Jan 11 07:47:03 PST 2012.
Topic: crawldata
Wide Crawl started September 2012
web
eye 433,770
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Fri Oct 19 01:17:49 PDT 2012 to Thu Oct 18 21:38:24 PDT 2012.
Topic: crawldata
Wide Crawl started April 2013
web
eye 431,954
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl419.us.archive.org:wide from Wed May 15 12:34:32 PDT 2013 to Wed May 15 07:17:30 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
web
eye 431,025
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Sat May 11 23:54:41 PDT 2013 to Sat May 11 18:21:46 PDT 2013.
Topic: crawldata
Wide Crawl started January 2012
web
eye 429,290
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 18:18:33 PST 2012 to Thu Jan 5 11:30:47 PST 2012.
Topic: crawldata
Wide Crawl started April 2013
web
eye 424,843
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl419.us.archive.org:wide from Wed May 15 13:33:28 PDT 2013 to Wed May 15 07:56:18 PDT 2013.
Topic: crawldata
Wide Crawl Number 12 - started March, 14th 2015
web
eye 423,034
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Mon Mar 23 07:02:14 PDT 2015 to Mon Mar 23 01:21:52 PDT 2015.
Topic: crawldata
Wide Crawl Number 12 - started March, 14th 2015
web
eye 422,778
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl451.us.archive.org:wide from Sun Mar 15 22:59:32 PDT 2015 to Sun Mar 15 17:21:54 PDT 2015.
Topic: crawldata
Wide Crawl started January 2012
web
eye 416,636
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 20:28:12 PST 2012 to Thu Jan 5 13:25:29 PST 2012.
Topic: crawldata
Wide Crawl started October 2010
web
eye 416,183
favorite 0
comment 0
Internet Archive crawldata from all sites, captured by ia360927.us.archive.org:wide from Sun Nov 21 08:17:09 UTC 2010 to Sun Nov 21 10:11:30 UTC 2010.
Topic: crawldata
Wide Crawl started October 2010
web
eye 413,890
favorite 0
comment 0
Internet Archive crawldata from all sites, captured by ia360927.us.archive.org:wide from Sun Nov 21 09:30:00 UTC 2010 to Sun Nov 21 11:15:41 UTC 2010.
Topic: crawldata
Wide Crawl started March 2011
web
eye 413,851
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl415.us.archive.org:wide from Sun Aug 14 03:13:23 PDT 2011 to Sat Aug 13 22:20:18 PDT 2011.
Topic: crawldata
Wide Crawl Started January 2013
web
eye 390,693
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl339.us.archive.org:wide from Wed Jan 30 00:37:31 PST 2013 to Tue Jan 29 21:10:11 PST 2013.
Topic: crawldata
Wide Crawl started January 2012
web
eye 381,039
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 16:52:35 PST 2012 to Thu Jan 5 10:17:00 PST 2012.
Topic: crawldata
Wide Crawl started February 2014
web
eye 367,072
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Sat Feb 8 14:04:44 PST 2014 to Sat Feb 8 09:51:54 PST 2014.
Topic: crawldata
Wide Crawl started January 2012
web
eye 365,490
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Tue Apr 17 00:58:46 PDT 2012 to Mon Apr 16 20:37:31 PDT 2012.
Topic: crawldata
Wide Crawl started September 2012
web
eye 354,518
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl410.us.archive.org:wide from Wed Nov 7 02:28:32 PST 2012 to Tue Nov 6 20:50:40 PST 2012.
Topic: crawldata
Wide Crawl Started January 2013
web
eye 350,336
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl336.us.archive.org:wide from Fri Apr 12 05:44:44 PDT 2013 to Fri Apr 12 01:55:44 PDT 2013.
Topic: crawldata
Wide Crawl started January 2012
web
eye 344,590
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 15:53:45 PST 2012 to Thu Jan 5 08:51:54 PST 2012.
Topic: crawldata
Wide Crawl started October 2010
web
eye 342,474
favorite 0
comment 0
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Sat Sep 25 20:13:06 UTC 2010 to Sun Sep 26 06:01:07 UTC 2010.
Topic: crawldata
Wide Crawl started February 2014
web
eye 332,164
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl416.us.archive.org:wide from Sat Feb 8 21:10:24 PST 2014 to Sat Feb 8 16:09:44 PST 2014.
Topic: crawldata
Wide Crawl started October 2010
web
eye 318,992
favorite 0
comment 0
Internet Archive crawldata from all sites, captured by ia360905.us.archive.org:wide from Sat Dec 18 18:22:04 UTC 2010 to Sat Dec 18 23:01:04 UTC 2010.
Topic: crawldata
Wide Crawl started October 2010
web
eye 312,532
favorite 0
comment 0
Internet Archive crawldata from all sites, captured by ia360919.us.archive.org:wide from Fri Sep 24 19:56:41 UTC 2010 to Sat Sep 25 03:53:54 UTC 2010.
Topic: crawldata
Wide Crawl started October 2010
web
eye 309,330
favorite 0
comment 0
Internet Archive crawldata from all sites, captured by ia360905.us.archive.org:wide from Sat Dec 18 19:50:25 UTC 2010 to Sat Dec 18 23:32:37 UTC 2010.
Topic: crawldata
Wide Crawl started April 2012
web
eye 302,303
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Fri Jun 8 07:28:07 PDT 2012 to Fri Jun 8 02:31:40 PDT 2012.
Topic: crawldata
Wide Crawl started April 2013
web
eye 299,124
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Sat Apr 20 06:45:23 PDT 2013 to Sat Apr 20 00:39:11 PDT 2013.
Topic: crawldata
Wide Crawl started April 2013
web
eye 296,140
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl414.us.archive.org:wide from Sat Apr 20 06:45:31 PDT 2013 to Sat Apr 20 00:36:02 PDT 2013.
Topic: crawldata
Wide Crawl started August 2013
web
eye 295,989
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl422.us.archive.org:wide from Sun Aug 11 01:39:39 PDT 2013 to Sat Aug 10 21:58:44 PDT 2013.
Topic: crawldata
Wide Crawl started January 2012
web
eye 289,748
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl413.us.archive.org:wide from Thu Jan 5 19:32:22 PST 2012 to Thu Jan 5 12:20:54 PST 2012.
Topic: crawldata
Wide Crawl started January 2012
web
eye 288,579
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl336.us.archive.org:wide from Wed Jan 18 17:01:06 PST 2012 to Wed Jan 18 09:17:47 PST 2012.
Topic: crawldata
Wide Crawl started June 2014
web
eye 287,564
favorite 0
comment 0
Internet Archive crawldata from Webwide Crawl, captured by crawl414.us.archive.org:wide from Wed Jun 25 20:16:46 PDT 2014 to Wed Jun 25 15:54:50 PDT 2014.
Topic: crawldata